NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Real-Time Access Control for Background and Co-Occurrence Image Privacy Protection

https://doi.org/10.1109/TETC.2025.3572396

Cai, Chaoquan; Lin, Dan; Palaniappan, Kannappan; Clifton, Chris (January 2025, IEEE Transactions on Emerging Topics in Computing)

Full Text Available
On Improving Fairness of AI Models with Synthetic Minority Oversampling Techniques

Zhou, Yan; Kantarcioglu, Murat; Clifton, Chris (April 2023, Society for Industrial and Applied Mathematics)

Biased AI models result in unfair decisions. In response, a number of algorithmic solutions have been engineered to mitigate bias, among which the Synthetic Minority Oversampling Technique (SMOTE) has been studied, to an extent. Although the SMOTE technique and its variants have great potentials to help improve fairness, there is little theoretical justification for its success. In addition, formal error and fairness bounds are not clearly given. This paper attempts to address both issues. We prove and demonstrate that synthetic data generated by oversampling underrepresented groups can mitigate algorithmic bias in AI models, while keeping the predictive errors bounded. We further compare this technique to the existing state-of-the-art fair AI techniques on five datasets using a variety of fairness metrics. We show that this approach can effectively improve fairness even when there is a significant amount of label and selection bias, regardless of the baseline AI algorithm.
more » « less
Full Text Available
Unfair AI: It Isn’t Just Biased Data

https://doi.org/10.1109/ICDM54844.2022.00114

Haider, Chowdhury_Mohammad Rakin; Clifton, Chris; Zhou, Yan (November 2022, IEEE)

Conventional wisdom holds that discrimination in machine learning is a result of historical discrimination: biased training data leads to biased models. We show that the reality is more nuanced; machine learning can be expected to induce types of bias not found in the training data. In particular, if different groups have different optimal models, and the optimal model for one group has higher accuracy, the optimal accuracy joint model will induce disparate impact even when the training data does not display disparate impact. We argue that due to systemic bias, this is a likely situation, and simply ensuring training data appears unbiased is insufficient to ensure fair machine learning.
more » « less
Full Text Available
A Roadmap for Greater Public Use of Privacy-Sensitive Government Data: Workshop Report

https://doi.org/10.48550/arXiv.2208.01636

Clifton, Chris; Malin, Bradley; Oganian, Anna; Raskar, Ramesh; Sharma, Vivek (June 2022, ArXivorg)

Government agencies collect and manage a wide range of ever-growing datasets. While such data has the potential to support research and evidence-based policy making, there are concerns that the dissemination of such data could infringe upon the privacy of the individuals (or organizations) from whom such data was collected. To appraise the current state of data sharing, as well as learn about opportunities for stimulating such sharing at a faster pace, a virtual workshop was held on May 21st and 26th, 2021, sponsored by the National Science Foundation and National Institute of Standards and Technologies, where a multinational collection of researchers and practitioners were brought together to discuss their experiences and learn about recently developed technologies for managing privacy while sharing data. The workshop specifically focused on challenges and successes in government data sharing at various levels. The first day focused on successful examples of new technology applied to sharing of public data, including formal privacy techniques, synthetic data, and cryptographic approaches. Day two emphasized brainstorming sessions on some of the challenges and directions to address them.
more » « less
Full Text Available
Semantic Integration in Heterogeneous Databases Using Neural Networks

Li, Wen-Syan; Clifton, Chris (September 1994, Proceedings of the 20th International Conference on Very Large Data Bases)

One important step in integrating heterogeneous databases is matching equivalent attributes: Determining which fields in two databases refer to the same data. The meaning of information may be embodied within a database model, a conceptual schema, application programs, or data contents. Integration involves extracting semantics, expressing them as metadata, and matching semantically equivalent data elements. We present a procedure using a classifier to categorize attributes according to their field specifications and data values, then train a neural network to recognize similar attributes. In our technique, the knowledge of how to match equivalent data elements is "discovered" from metadata, not "pre-programmed".
more » « less
Full Text Available

Search for: All records